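Because proven 2D detection techniques carry over to it, most current point cloud detectors widely adopt the bird's-eye view (BEV). However, existing methods obtain BEV features by simply collapsing voxel or point features along the height dimension, which causes a heavy loss of 3D spatial information. To alleviate this information loss, we propose a novel point cloud detection network based on a multi-level feature dimensionality reduction strategy, called MDRNet. In MDRNet, Spatial-aware Dimensionality Reduction (SDR) is designed to dynamically focus on the valuable parts of objects during the voxel-to-BEV feature transformation. In addition, Multi-level Spatial Residuals (MSR) are proposed to fuse multi-level spatial information in the BEV feature maps. Extensive experiments on nuScenes show that the proposed method outperforms state-of-the-art methods. The code will be available upon publication.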
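The difference between a plain height collapse and a weighted reduction can be sketched on a single BEV cell. This is a minimal illustration, not MDRNet's SDR module: the function names, the list-of-lists layout, and the hand-supplied `scores` (which a real network would learn) are all assumptions made for the example.

```python
import math

def collapse_max(column):
    """Baseline: max-pool a voxel column (Z feature vectors) over height."""
    return [max(values) for values in zip(*column)]

def collapse_weighted(column, scores):
    """Softmax-weighted sum over height; `scores` has one scalar per height bin.

    In a learned model the scores would come from the network; here they
    are passed in by hand (assumption for illustration).
    """
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    channels = len(column[0])
    return [sum(w * vec[c] for w, vec in zip(weights, column))
            for c in range(channels)]

# One BEV cell with 3 height bins and 2 feature channels.
column = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
pooled = collapse_max(column)
uniform = collapse_weighted(column, [0.0, 0.0, 0.0])
focused = collapse_weighted(column, [0.0, 50.0, 0.0])
```

With uniform scores the weighted reduction is just the mean over height, while a dominant score makes the output follow a single height bin, which is the kind of selectivity a fixed max or sum cannot express.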
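In modern manufacturing environments, the demand for contact-rich tasks is growing rapidly. However, few traditional robotic assembly skills consider environmental constraints during task execution, and most of them use these constraints only as termination conditions. In this study, we propose a push-based hybrid position/force assembly skill that maximizes the use of environmental constraints during task execution. To the best of our knowledge, this is the first work to consider pushing actions during the execution of assembly tasks. Through assembly experiments with a mobile manipulator system, we demonstrate that our skill maximizes the utilization of environmental constraints and achieves a 100% success rate in execution.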
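Hybrid position/force control in general assigns each Cartesian axis to either a position loop or a force loop. The sketch below shows only that textbook idea, under stated assumptions: the function name, the proportional gains `kp` and `kf`, and the per-axis boolean mask are hypothetical and are not taken from the paper's skill.

```python
def hybrid_command(force_axes, pos_error, force_error, kp=1.0, kf=0.01):
    """Per-axis velocity command for a hybrid position/force controller.

    force_axes[i] is True when axis i is force-controlled (e.g. the axis
    pushing against a surface); all other axes track position.
    The gains are illustrative placeholders, not tuned values.
    """
    return [kf * fe if use_force else kp * pe
            for use_force, pe, fe in zip(force_axes, pos_error, force_error)]

# x and y track position; z pushes against the environment with a force target.
command = hybrid_command(
    force_axes=[False, False, True],
    pos_error=[0.02, -0.01, 0.5],   # metres; the z entry is ignored
    force_error=[0.0, 0.0, 4.0],    # newtons; only the z entry is used
)
```

The z position error is deliberately ignored: on a force-controlled axis the environment, not the position target, determines where the tool ends up.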
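Local image feature matching, which aims to identify and put into correspondence similar regions of an image pair, is an essential concept in computer vision. Most existing image matching methods follow a one-to-one assignment principle and use mutual nearest neighbors to guarantee unique correspondences between local features across images. However, images taken under different conditions may exhibit large scale variations or viewpoint differences, so one-to-one assignment may produce ambiguous or missing matches in dense matching. In this paper, we introduce AdaMatcher, a novel detector-free local feature matching method. It first correlates dense features with a lightweight feature interaction module and estimates the co-visible area of the paired images, then performs patch-level many-to-one assignment to predict match proposals, and finally refines them with a one-to-one refinement module. Extensive experiments show that AdaMatcher outperforms solid baselines and achieves state-of-the-art results on many downstream tasks. In addition, the many-to-one assignment and one-to-one refinement modules can serve as a refinement network for other matching methods, such as SuperGlue, to further improve their performance. Code will be available upon publication.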
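The mutual-nearest-neighbor rule that the abstract attributes to existing one-to-one methods can be sketched as follows. The similarity matrix and function name are illustrative assumptions; this is the baseline criterion, not AdaMatcher's many-to-one assignment.

```python
def mutual_nearest_neighbors(sim):
    """Return (i, j) pairs that are each other's best match.

    sim[i][j] is the similarity between feature i of image A and
    feature j of image B. Ties resolve to the lowest index.
    """
    if not sim or not sim[0]:
        return []
    best_for_a = [max(range(len(row)), key=row.__getitem__) for row in sim]
    best_for_b = [max(range(len(sim)), key=lambda i, j=j: sim[i][j])
                  for j in range(len(sim[0]))]
    return [(i, j) for i, j in enumerate(best_for_a) if best_for_b[j] == i]

# Features 0 and 1 of image A both look most like feature 0 of image B,
# as can happen when one coarse patch in B covers two patches in A.
sim = [
    [0.9, 0.1, 0.2],
    [0.8, 0.3, 0.1],
    [0.2, 0.1, 0.7],
]
matches = mutual_nearest_neighbors(sim)
```

Here feature 1 of image A is left unmatched even though it strongly resembles feature 0 of image B. This is the kind of missing correspondence under scale change that motivates allowing many-to-one assignment.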
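Pseudo-label-based semi-supervised learning (SSL) has achieved great success in exploiting raw data. However, its training procedure suffers from confirmation bias caused by the noise in self-generated artificial labels. Moreover, the model's judgment becomes noisier in real-world applications with extensive out-of-distribution data. To address this issue, we propose a general method named Class-aware Contrastive Semi-Supervised Learning (CCSSL), a drop-in helper that improves pseudo-label quality and enhances the model's robustness in real-world settings. Rather than treating real-world data as one union set, our method handles reliable in-distribution data with class-wise clustering, for blending into downstream tasks, and noisy out-of-distribution data with image-wise contrastive learning, for better generalization. Furthermore, by applying target re-weighting, we emphasize clean-label learning while reducing noisy-label learning. Despite its simplicity, the proposed CCSSL yields significant performance improvements over state-of-the-art SSL methods on the standard datasets CIFAR100 and STL10. On the real-world dataset Semi-iNat 2021, we improve FixMatch by 9.80% and CoMatch by 3.18%. Code is available at https://github.com/tencentyouturesearch/classification-spoomls.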
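Confidence-thresholded pseudo-labeling, the mechanism whose noise the abstract discusses, can be sketched in a few lines. This is a generic illustration in the style of FixMatch's confidence threshold, not CCSSL's target re-weighting; the function name, the default threshold, and the use of the confidence itself as a sample weight are assumptions.

```python
def pseudo_labels(prob_rows, threshold=0.95):
    """Assign a pseudo-label and a loss weight to each unlabeled sample.

    prob_rows holds one predicted class distribution per sample.
    Samples below the confidence threshold get weight 0.0 and therefore
    do not contribute to the unsupervised loss.
    """
    labels, weights = [], []
    for probs in prob_rows:
        confidence = max(probs)
        labels.append(probs.index(confidence))
        weights.append(confidence if confidence >= threshold else 0.0)
    return labels, weights

predictions = [
    [0.97, 0.02, 0.01],  # confident: kept as a pseudo-label
    [0.50, 0.30, 0.20],  # uncertain: masked out
]
labels, weights = pseudo_labels(predictions)
```

A confidently wrong prediction still passes the threshold, which is one way to see the confirmation bias that the abstract describes.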
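The MineRL competition is designed to foster reinforcement learning and imitation learning algorithms that can efficiently leverage human demonstrations to drastically reduce the number of environment interactions needed to solve the complex \emph{ObtainDiamond} task with sparse rewards. To address the challenge, in this paper we present \textbf{SEIHAI}, a \textbf{S}ample-\textbf{e}ff\textbf{i}cient \textbf{H}ierarchical \textbf{AI}, which takes full advantage of the human demonstrations and the task structure. Specifically, we split the task into several sequentially dependent subtasks and train a suitable agent for each subtask using reinforcement learning and imitation learning. We further design a scheduler that automatically selects different agents for different subtasks. SEIHAI took first place in both the preliminary and final rounds of the NeurIPS-2020 MineRL competition.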
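Recently, deep multi-agent reinforcement learning (MARL) has shown promise in solving complex cooperative tasks. Its success is partly due to parameter sharing among agents. However, such sharing may lead agents to behave similarly and limit their coordination capacity. In this paper, we aim to introduce diversity into both the optimization and the representation of shared multi-agent reinforcement learning. Specifically, we propose an information-theoretic regularization that maximizes the mutual information between agents' identities and their trajectories, encouraging extensive exploration and diverse individualized behaviors. In representation, we incorporate agent-specific modules into the shared neural network architecture; these modules are regularized by the L1-norm to promote sharing of learning among agents while keeping the necessary diversity. Empirical results show that our method achieves state-of-the-art performance on Google Research Football and on super-hard StarCraft II micromanagement tasks.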
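The quantity being regularized, mutual information between agent identity and behavior, can be computed directly on a small count table to build intuition. This sketch evaluates the standard definition on discrete counts; the table layout (rows as agents, columns as discretized behavior types) is an assumption for illustration, and the abstract does not say how the method estimates or bounds this quantity during training.

```python
import math

def mutual_information(joint_counts):
    """I(X;Y) in nats from co-occurrence counts joint_counts[x][y]."""
    total = sum(sum(row) for row in joint_counts)
    row_sums = [sum(row) for row in joint_counts]
    col_sums = [sum(col) for col in zip(*joint_counts)]
    mi = 0.0
    for x, row in enumerate(joint_counts):
        for y, count in enumerate(row):
            if count:
                # p(x,y) * log( p(x,y) / (p(x) p(y)) ), written with counts
                mi += (count / total) * math.log(
                    count * total / (row_sums[x] * col_sums[y]))
    return mi

# Rows: agent identity. Columns: which of two behavior types was observed.
homogeneous = mutual_information([[5, 5], [5, 5]])    # agents act alike
specialized = mutual_information([[10, 0], [0, 10]])  # each agent has its own behavior
```

When both agents behave identically the identity tells nothing about the behavior and the mutual information is zero; when each agent has its own behavior it reaches log 2 nats, the maximum for two equally likely agents.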
Few-Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes given only a few support examples. In this work, we explore a simple yet unified solution for FSIS and its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After these steps, performance on the novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarked on the COCO dataset under the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots, e.g., we boost nAP by a noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few-Shot Object Detection. Code and model will be available.
This paper focuses on designing efficient models with few parameters and low FLOPs for dense prediction. Even though CNN-based lightweight methods have achieved stunning results after years of research, the trade-off between model accuracy and constrained resources still needs further improvement. This work rethinks the essential unity of the efficient Inverted Residual Block in MobileNetv2 and the effective Transformer in ViT, inductively abstracting a general concept of a Meta-Mobile Block, and we argue that the specific instantiation is very important to model performance even when the same framework is shared. Motivated by this phenomenon, we deduce a simple yet efficient modern \textbf{I}nverted \textbf{R}esidual \textbf{M}obile \textbf{B}lock (iRMB) for mobile applications, which absorbs CNN-like efficiency to model short-distance dependencies and Transformer-like dynamic modeling capability to learn long-distance interactions. Furthermore, we design a ResNet-like 4-phase \textbf{E}fficient \textbf{MO}del (EMO) based only on a series of iRMBs for dense applications. Extensive experiments on the ImageNet-1K, COCO2017, and ADE20K benchmarks demonstrate the superiority of our EMO over state-of-the-art methods, e.g., our EMO-1M/2M/5M achieve 71.5, 75.1, and 78.4 Top-1 accuracy, surpassing \textbf{SoTA} CNN-/Transformer-based models while trading off model accuracy and efficiency well.
Few-shot learning (FSL) aims to transfer knowledge learned from base categories with sufficient labelled data to novel categories with scarce known information. It is currently an important research question and has great practical value in real-world applications. Although extensive efforts have been made on few-shot learning tasks, we emphasize that most existing methods do not take into account the distributional shift caused by sample selection bias in the FSL scenario. Such selection bias can induce spurious correlation between the semantic causal features, which are causally and semantically related to the class label, and the other, non-causal features. Critically, the former should be invariant across changes in distribution, highly related to the classes of interest, and thus generalize well to novel classes, while the latter are not stable under changes in distribution. To resolve this problem, we propose a novel data augmentation strategy, dubbed PatchMix, that breaks this spurious dependency by replacing the patch-level information and supervision of the query images with those of random gallery images from classes different from the query's. We show theoretically that such an augmentation mechanism, unlike existing ones, is able to identify the causal features. To make these features discriminative enough for classification, we further propose Correlation-guided Reconstruction (CGR) and a Hardness-Aware module for instance discrimination and easier discrimination between similar classes. Moreover, the framework can be adapted to the unsupervised FSL scenario.
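The patch-replacement idea described above, swapping both patch content and patch-level supervision, can be sketched on images represented as grids of patches. This is a rough illustration only: the grid representation, the `ratio` parameter, the uniform random choice of patches, and the function name are assumptions, and the abstract does not specify how PatchMix selects patches.

```python
import random

def patch_mix(query, query_label, gallery, gallery_label, ratio=0.5, rng=None):
    """Replace a random subset of query patches with gallery patches.

    query and gallery are equally sized 2D grids of patches. Returns the
    mixed grid and a grid of per-patch labels, so supervision follows
    the patch content rather than the image it was pasted into.
    """
    rng = rng or random.Random()
    rows, cols = len(query), len(query[0])
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    swapped = set(rng.sample(cells, int(ratio * len(cells))))
    mixed = [[gallery[r][c] if (r, c) in swapped else query[r][c]
              for c in range(cols)] for r in range(rows)]
    labels = [[gallery_label if (r, c) in swapped else query_label
               for c in range(cols)] for r in range(rows)]
    return mixed, labels

# Patches are stand-in strings here; in practice they would be pixel blocks.
query = [["q00", "q01"], ["q10", "q11"]]
gallery = [["g00", "g01"], ["g10", "g11"]]
mixed, labels = patch_mix(query, "cat", gallery, "dog", rng=random.Random(0))
```

Because each pasted patch carries the gallery image's label, a model can no longer rely on surrounding context that merely co-occurs with the query class.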
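3D object detection is an essential part of automated driving, and deep neural networks (DNNs) have achieved state-of-the-art performance on this task. However, deep models are notorious for assigning high confidence scores to out-of-distribution (OOD) inputs, that is, inputs not drawn from the training distribution. Detecting OOD inputs is challenging and essential for the safe deployment of models. OOD detection has been studied extensively for classification, but it has not received enough attention for object detection, and in particular for LiDAR-based 3D object detection. In this paper, we focus on detecting OOD inputs for LiDAR-based 3D object detection. We formulate what OOD inputs mean for object detection and propose adapting several OOD detection methods to object detection. We accomplish this through our proposed feature extraction method. To evaluate OOD detection methods, we develop a simple but effective technique for generating OOD objects for a given object detection model. Our evaluation on the KITTI dataset shows that different OOD detection methods are biased toward detecting specific OOD objects. This underlines the importance of combined OOD detection methods and of further research in this direction.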
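A common classification-style OOD score is the maximum softmax probability, where a low top-class confidence is treated as a sign of an unfamiliar input. The sketch below shows that generic baseline only; the abstract does not name the specific methods it adapts, and the function names and the threshold value are assumptions.

```python
import math

def max_softmax_score(logits):
    """Highest class probability under a numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    return max(exps) / sum(exps)

def is_ood(logits, threshold=0.5):
    """Flag a detection as OOD when its top-class confidence is low.

    The threshold is an illustrative placeholder; in practice it would
    be chosen on held-out data.
    """
    return max_softmax_score(logits) < threshold

confident = is_ood([10.0, 0.0, 0.0])   # one class clearly dominates
uncertain = is_ood([0.1, 0.0, 0.05])   # nearly flat class scores
```

This baseline fails exactly in the case the abstract warns about: a model that assigns a high confidence score to an OOD object will pass the check, which is one reason to consider combining several detection methods.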